CAFE: aCcelerated Alignment-FrEe sequence analysis
Abstract
Alignment-free genome and metagenome comparisons are increasingly important with the development of next-generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures, including CVTree, $d_2^*$ and $d_2^S$, are more computationally expensive than measures based solely on k-mer frequencies. Here, we report a standalone software package, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE accepts both assembled genome sequences and unassembled NGS shotgun reads as input, and writes its output in standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarities as dendrograms, heatmaps, principal coordinate analysis plots and network displays. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE.
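To make the idea of a measure "based solely on k-mer frequencies" concrete, the following is a minimal Python sketch, not CAFE's own code or interface. It assumes DNA input over the alphabet A, C, G, T, uses the Manhattan distance between k-mer frequency vectors as a stand-in for the simpler measures, and writes a square distance matrix in PHYLIP layout. The function names (`kmer_frequencies`, `manhattan`, `distance_matrix`, `to_phylip`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k DNA k-mers, in lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Assumption: k-mers containing symbols other than A/C/G/T are ignored.
    total = sum(counts[w] for w in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[w] / total for w in kmers]


def manhattan(f, g):
    """Manhattan (L1) distance between two frequency vectors."""
    return sum(abs(a - b) for a, b in zip(f, g))


def distance_matrix(seqs, k):
    """Pairwise dissimilarities for a dict mapping name -> sequence."""
    names = list(seqs)
    freqs = {n: kmer_frequencies(seqs[n], k) for n in names}
    matrix = [[manhattan(freqs[a], freqs[b]) for b in names] for a in names]
    return names, matrix


def to_phylip(names, matrix):
    """Format a square distance matrix in PHYLIP layout.

    First line: number of taxa. Each following line: a name padded or
    truncated to 10 characters, then the row of distances.
    """
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        lines.append(f"{name[:10]:<10} " + " ".join(f"{d:.6f}" for d in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    toy = {"seqA": "ACGTACGT", "seqB": "ACGTACGT", "seqC": "AAAAAAAA"}
    names, matrix = distance_matrix(toy, k=2)
    print(to_phylip(names, matrix), end="")
```

Measures such as $d_2^*$ and $d_2^S$ additionally subtract the k-mer counts expected under a background sequence model before comparing, which is one reason they cost more to compute than the plain frequency-based distance sketched here.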